Analysis of aggregated tick returns: evidence for anomalous diffusion 
In order to investigate the origin of large price fluctuations, we analyze stock price changes of ten 
frequently traded NASDAQ stocks in the year 2002. Though the influence of the trading frequency 
on the aggregate return in a certain time interval is important, it cannot alone explain the heavy 
tailed distribution of stock price changes. For this reason, we analyze intervals with a fixed number 
of trades in order to eliminate the influence of the trading frequency and investigate the relevance of 
other factors for the aggregate return. We show that in tick time the price follows a discrete diffusion 
process with a variable step width while the difference between the number of steps in positive and 
negative direction in an interval is Gaussian distributed. The step width is given by the return due 
to a single trade and is long-term correlated in tick time. Hence, its mean value can well characterize 
an interval of many trades and turns out to be an important determinant for large aggregate returns. 
We also present a statistical model reproducing the cumulative distribution of aggregate returns. 
For an accurate agreement with the empirical distribution, we also take into account asymmetries 
of the step widths in different directions together with cross-correlations between these asymmetries 
and the mean step width as well as the signs of the steps. 



The mechanics of stock price changes were studied already
more than a hundred years ago, when Bachelier modelled
price movements as a diffusion process with Gaussian
fluctuations. However, empirical studies show that the
distribution of returns has heavy tails, meaning that events
with large price changes are much more probable than in a
Gaussian distribution. In addition, the functional form of
the distribution stays similar if the return is aggregated on
very different time scales from seconds to months,
approximating a Gaussian distribution only if the time scale
becomes very large.

These findings would suggest that stock returns have a
Lévy stable distribution. In a Lévy flight, the second
moment would be divergent and extreme returns aggregated
over a long time would be determined by very large price
jumps on smaller time scales. However, empirical studies
find evidence that the tail of the return distribution follows
a power law with exponent around $\alpha - 1 = 3$, so that it
does not agree with the stable Paretian hypothesis.

The cause of the fat tails is currently a subject of great
interest. Farmer et al. find that the distribution of returns
due to a single trade (tick returns) is similar to the
distribution of returns aggregated on longer time scales with
the same tail exponent [25]. Although the tail exponent is
outside the Lévy regime $0 < \alpha - 1 < 2$, they argue that,
similar to a Lévy flight, both distributions are caused by the
same microscopic mechanism, so that large aggregate returns
are due to single exceptionally large tick returns. Plerou et
al. describe the price movements as a diffusion process with a
fluctuating diffusion constant and relate the distribution of
aggregate returns to the distribution of the variance of the
tick returns.

In the present paper, we investigate the transition from
tick returns to returns aggregated in intervals with a larger
number of trades. It is well documented (e.g. in [27, 28])
that the number of trades in a time interval is an important
determinant of the aggregate return. However, the trading
frequency alone cannot account for the observed fat-tailed
distribution of aggregate returns [26]. Thus, we remove the
direct influence of the trading frequency by analyzing
intervals with a constant number of trades so that effects due
to other quantities like the tick return size are more clearly
visible.

We study how each aggregate return is actually built from
the basic quantities involved in the process, and thus
examine the mechanism leading to large price fluctuations.
According to the central limit theorem, independent tick
returns would in aggregation lead to Gaussian distributed
returns. However, we find that the tick return size is
long-term correlated in tick time, so that the conditions of
the central limit theorem are not fulfilled. Thus, the mean
tick return size can well characterize an interval of many
trades and its fluctuations lead to the non-Gaussian behavior
of the aggregate return. In this picture, large aggregate
returns do not occur because of a few very large tick returns,
but rather when the average tick return is large, so that even
Gaussian fluctuations in the direction of the trades can lead
to aggregate returns larger than expected for a Gaussian
distribution.

The remainder of this paper is organized as follows:
section I presents our model for the price diffusion process,
section II describes the data set used for this study, section
III shows the influence of the tick return size on the
aggregate return, and section IV focuses on the influence of
differences in the direction of tick returns (number
difference). Section V compares the number difference and the
flow of market orders, and in section VI we present a
statistical model which approximates the distribution of
aggregate returns. We conclude with a discussion of our
results in section VII.






I. MODEL 

We study intervals with a fixed number of $N = 100$
trades. If the price of a stock before the $i$-th trade is
$s_i$, we define the return due to a single trade, the tick
return, as

$$\delta g_i = \ln(s_{i+1}) - \ln(s_i). \qquad (1)$$

The interval $I_j$ contains all $N$ trades with index $i$
between $jN$ and $(j+1)N$, so the aggregate return $G_j$ is
given by the sum over all $\delta g_i$ with $i \in I_j$:

$$G_j = \sum_{i \in I_j} \delta g_i. \qquad (2)$$
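As a minimal sketch of Eqs. (1) and (2), assuming the prices before each trade are available as a NumPy array (the names `prices`, `tick_returns` and `aggregate_returns` are illustrative and not taken from the original analysis), tick returns and aggregate returns over consecutive blocks of $N$ trades could be computed as follows:

```python
import numpy as np

def tick_returns(prices):
    """Eq. (1): log-return caused by each single trade."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def aggregate_returns(dg, n_trades=100):
    """Eq. (2): sum of tick returns in consecutive blocks of n_trades."""
    dg = np.asarray(dg, dtype=float)
    n_blocks = len(dg) // n_trades          # an incomplete last block is dropped
    return dg[: n_blocks * n_trades].reshape(n_blocks, n_trades).sum(axis=1)

# Tiny illustrative example with blocks of 2 trades instead of 100.
prices = [100.0, 101.0, 100.0, 102.0, 102.0]
dg = tick_returns(prices)
G = aggregate_returns(dg, n_trades=2)
```

Dropping the incomplete last block is a choice made here for simplicity; the paper does not state how such a remainder was handled.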

We want to discuss two special cases in order to analyze
the mechanism leading to large aggregate returns $G_j$. In the
first case, $G_j$ is dominated by one (or a few) extremely
large tick returns $\delta g_{i_0}^{\max}$, so that

$$G_j = \delta g_{i_0}^{\max} + \sum_{i \in I_j,\, i \neq i_0} \delta g_i \approx \delta g_{i_0}^{\max}. \qquad (3)$$

Thus, $G_j$ becomes large if $\delta g_{i_0}^{\max}$ is
exceptionally large.

In the second case, we assume that there is no extremely
large tick return dominating the aggregate return, so that we
focus on the average size $\Delta g_j$ of the non-zero tick
returns, which is defined by

$$\Delta g_j = \frac{1}{n_j} \sum_{i \in I_j} |\delta g_i|. \qquad (4)$$

Here, $n_j$ is the number of $\delta g_i \neq 0$ in the
interval $I_j$. Neglecting asymmetries in the $\delta g_i$, we
can replace all $\delta g_i \neq 0$ by
$\mathrm{sign}(\delta g_i)\,\Delta g_j$ and approximate the
aggregate return by

$$G_j \approx \Delta g_j \sum_{i \in I_j,\, \delta g_i \neq 0} \mathrm{sign}(\delta g_i) = \Delta g_j \Delta N_j \qquad (5)$$

where $\Delta N_j = \sum_{i \in I_j,\, \delta g_i \neq 0} \mathrm{sign}(\delta g_i)$
is called the number difference. Similarly, $G_j$ can be
described as a diffusion process with

$$\langle G_j^2 \rangle \approx D_j N \qquad (6)$$

where the diffusion constant $D_j = \frac{n_j}{N} \Delta g_j^2$
varies due to the varying step width $\Delta g_j$ and the
number $n_j$ of non-zero tick returns.
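The quantities entering Eqs. (4) and (5) can be illustrated with a short sketch. The function name and the toy block of five tick returns are assumptions made for this example only:

```python
import numpy as np

def step_width_and_number_difference(dg_block):
    """Return (Delta g_j, Delta N_j, n_j) for one interval of tick returns."""
    dg_block = np.asarray(dg_block, dtype=float)
    nonzero = dg_block[dg_block != 0.0]
    n_j = nonzero.size
    if n_j == 0:
        return 0.0, 0, 0
    delta_g = np.abs(nonzero).mean()         # Eq. (4): mean size of non-zero ticks
    delta_n = int(np.sign(nonzero).sum())    # number difference Delta N_j
    return delta_g, delta_n, n_j

block = np.array([0.02, -0.01, 0.0, 0.03, 0.01])
delta_g, delta_n, n_j = step_width_and_number_difference(block)
approx = delta_g * delta_n    # approximation of Eq. (5)
exact = block.sum()           # exact aggregate return, Eq. (2)
```

In this toy block the approximation gives 0.035 while the exact sum is 0.05; the gap comes from the asymmetry between the sizes of positive and negative tick returns, which Eq. (5) neglects.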

In the approximation given by Eq. (5), we can study the
influence of the mean size of the tick returns as well as
asymmetries in their direction. A large aggregate return can
occur if the price moves more often in one direction than in
the other. Thus, with large temporary correlations between the
signs, even small tick returns could compose a large $G_j$. On
the other hand, if $\Delta g_j$ is larger, even a small
asymmetry in the signs can lead to a large return.



The two approximations given in Eq. (3) and Eq. (5) are
analyzed in sections III and IV of this paper, but in section
VI we also consider the error term neglected in Eq. (5). An
exact formulation reads

$$G_j = \Delta g_j \Delta N_j + \frac{2 n_j^+ n_j^-}{n_j} \left( \Delta g_j^+ - \Delta g_j^- \right) \qquad (7)$$

where $\Delta g_j^+$ and $\Delta g_j^-$ are the average sizes
of the tick returns in positive and negative direction while
$n_j^+$ and $n_j^-$ are the numbers of non-zero tick returns in
positive and negative direction.
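That Eq. (7) is an identity rather than an approximation can be checked numerically. The sketch below uses a synthetic block of tick returns (random numbers with some zeros, an assumption for illustration only) and treats $\Delta g_j^-$ as a positive size:

```python
import numpy as np

def decompose(dg_block):
    """Split the aggregate return into the two terms of Eq. (7)."""
    dg = np.asarray(dg_block, dtype=float)
    pos, neg = dg[dg > 0], dg[dg < 0]
    n_pos, n_neg = pos.size, neg.size
    n = n_pos + n_neg
    if n == 0:
        return 0.0, 0.0
    dg_pos = pos.mean() if n_pos else 0.0
    dg_neg = -neg.mean() if n_neg else 0.0       # size, hence positive
    delta_g = (pos.sum() - neg.sum()) / n        # Eq. (4)
    delta_n = n_pos - n_neg                      # number difference
    leading = delta_g * delta_n                  # first term of Eq. (7)
    correction = 2.0 * n_pos * n_neg / n * (dg_pos - dg_neg)
    return leading, correction

rng = np.random.default_rng(0)
block = rng.normal(size=100) * rng.integers(0, 2, size=100)   # includes zeros
leading, correction = decompose(block)
residual = leading + correction - block.sum()
```

The residual vanishes up to floating-point rounding, so all deviations of Eq. (5) from the true aggregate return are carried by the asymmetry term.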



II. DATA ANALYSIS 

We analyzed order book data of the year 2002 from Island
ECN for the ten most frequently traded stocks. Since the
Island ECN is a secondary market where only part of the whole
stock volume is traded, we also studied the index fund QQQ,
which was mainly traded via Island until September 2002. Since
our results for the ten stocks and QQQ are similar, we find no
evidence that secondary market characteristics of Island
affect our analysis negatively.

In an electronic market place like Island, people can 
place limit orders to buy or to sell at a given or better 
price (limit price), which is specified in the order. These 
orders are stored in the order book and they are only 
executed when the actual stock price reaches the limit 
price. A trade is initiated by a market order indicating 
that someone wants to buy or sell immediately at the 
best available price. Such a market order executes the 
limit orders offering the best prices until the number of 
shares specified in the market order is traded. 

Our dataset contains information about every limit order
so that we are able to reproduce the market situation at each
instant of time. We combine those limit order executions with
identical time stamps as they reflect the same market order.
Therefore, we can analyze the impact of each single market
order on the price. In this analysis, the price is defined as
the mid-quote price, which is the mean of the best available
buy (bid) and sell (ask) limit prices (quotes). We study
intervals with a fixed number of $N = 100$ market orders and
have approximately 100,000 intervals in our dataset for ten
stocks. On average, a 100-trade interval corresponds to about
10 minutes, but the trading frequency fluctuates strongly so
that 100 trades can correspond to time intervals with very
different lengths.

We determine the mid-quote price $s_i$ just before the
execution of the $i$-th market order. Since most trades change
the price just by the size of the gap between the best and the
second best limit price [23], the tick return $\delta g_i$
corresponds to the gap size. We note that the price can (and
often does) change between two consecutive market orders due
to placement or cancellation of limit orders so that
$\delta g_i$ does not provide a direct estimate of the gap
size. We normalize the tick returns $\delta g_i$ by the
standard deviation of the aggregate return $G_j$ for each
stock individually so that we can combine the results for
different stocks.

FIG. 1: (Color online) Five largest price changes (a)
$\delta g_i^{\max+}$ and (b) $\delta g_i^{\max-}$ due to a
single trade with (a) the same and (b) the opposite sign as
the aggregate return in that 100 tick interval, plotted
against the rank of the corresponding aggregate return $|G_j|$
for the combined data of ten Nasdaq stocks in 2002 (smoothed
by averaging over 100 intervals). For large $|G_j|$, the size
of the $\delta g_i^{\max+}$ increases by a factor of two while
the increase in the $\delta g_i^{\max-}$ is slightly smaller.
The sum over all five $\delta g_i^{\max+}$ reaches more than 3
standard deviations for intervals with extremely large
$|G_j|$, but the fluctuations in the opposite direction are
almost equally large.



III. INFLUENCE OF THE SIZE OF TICK 
RETURNS 



First, we investigate the question whether large tick
returns caused by large gaps in the order book can be
responsible for large aggregate returns. To this end, we start
with the approximation shown in Eq. (3), where a few extremely
large tick returns (corresponding to some very large gaps in
the order book) lead to a very large aggregate return $G_j$.
In order to test this hypothesis, we analyzed the five largest
tick returns $\delta g_i^{\max+}$ with the same sign as the
aggregate return $G_j$ (i.e. the five largest positive
$\delta g_i$ if $G_j > 0$ and the five largest negative
$\delta g_i$ for $G_j < 0$) in each interval. We sort the
intervals by $|G_j|$ and plot the $\delta g_i^{\max+}$ against
the rank of the interval according to its return $|G_j|$.

Fig. 1(a) shows the values of these $\delta g_i^{\max+}$ in
intervals with small $G_j \approx 0$ on the left while the
values for large returns exceeding five standard deviations
can be found on the right. Since there are large fluctuations
in the data, we smoothed the curves by averaging over 100
intervals. The $\delta g_i^{\max+}$ grow by a factor of two
between small and very large returns $|G_j|$. When aggregated,
these five largest $\delta g_i^{\max+}$ can reach about three
standard deviations, which is almost half of the largest
aggregate returns.

FIG. 2: Density plot of the 100-trade return $|G_j|$ of ten
Nasdaq stocks against the average return of a single trade
$\Delta g_j$ for each interval. The points are coded from
light grey to black indicating the number of events from 1 to
more than 500. A linear regression has only a small
correlation coefficient $R^2 = 0.07$.

In Fig. 1(b), we plot the five largest tick returns
$\delta g_i^{\max-}$ with the opposite direction as the
aggregate return against their rank. The $\delta g_i^{\max-}$
behave similarly to the $\delta g_i^{\max+}$, though the
increase for large aggregate returns is slightly weaker.
However, even for the largest aggregate returns the difference
between the $\delta g_i^{\max+}$ and $\delta g_i^{\max-}$ is
rather small, so that there are also large tick returns
reducing the aggregate return.
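The rank plots used in this section can be sketched as follows. The text states that curves are smoothed by averaging over 100 intervals; whether this was a block average or a moving average is not specified, so the non-overlapping block average below is an assumption, as is the helper name `rank_profile`:

```python
import numpy as np

def rank_profile(abs_g, quantity, window=100):
    """Sort intervals by |G_j| and average `quantity` over `window` ranks."""
    order = np.argsort(abs_g)                        # rank by |G_j|
    sorted_q = np.asarray(quantity, dtype=float)[order]
    n_bins = len(sorted_q) // window                 # incomplete last bin dropped
    return sorted_q[: n_bins * window].reshape(n_bins, window).mean(axis=1)

# Toy input: a quantity that grows with |G_j| by construction.
abs_g = np.arange(1000)[::-1].astype(float)
quantity = 2.0 * abs_g
profile = rank_profile(abs_g, quantity, window=100)
```

Each entry of `profile` corresponds to one smoothed point of a rank curve such as those in Fig. 1.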

Our results suggest that large aggregate returns are not
the result of single exceptionally large tick returns since
very large tick returns occur in both directions and cancel
each other out. In the following, we want to focus not on the
extreme tick returns, but on the influence of their mean
value. More precisely, we analyze Eq. (5) and the mean tick
return $\Delta g_j$ of all non-zero $|\delta g_i|$ in the
interval $I_j$ as defined in Eq. (4). A density plot of
$|G_j|$ against $\Delta g_j$ is shown in Fig. 2. It seems that
extremely large returns $G_j$ correspond to larger average
tick returns $\Delta g_j$, but the broad distribution suggests
that the explanatory power of $\Delta g_j$ alone for the
aggregate return $G_j$ is small, which is confirmed by the low
correlation coefficient $R^2 = 0.07$ of a linear regression.

In order to clarify the relation between the extreme values
of $|G_j|$ and $\Delta g_j$, we sort the intervals by $|G_j|$
and plot $\Delta g_j$ against the rank of the interval
according to its return $|G_j|$. In Fig. 3 (black curve), we
see that large returns $|G_j|$ coincide with larger tick
returns as $\Delta g_j$ changes by a factor of two from very
low aggregate returns to large returns of several standard
deviations. In comparison with the largest tick returns
$\delta g_i^{\max+}$ shown in Fig. 1, the change of a factor
of two is similar, but the mean $\Delta g_j$ is two to four
times smaller than the largest tick returns.

FIG. 3: (Color online) Black curve: average tick return
$\Delta g_j$ of ten Nasdaq stocks plotted against the rank of
the corresponding aggregate return $|G_j|$, smoothed by
averaging over 100 intervals. Going from the smallest returns
$|G_j| \approx 0$ to returns larger than 5 standard
deviations, $\Delta g_j$ increases by a factor of two. Light
grey curve: after shuffling the tick returns for each stock,
the same curve is only slightly increased for the largest
aggregate returns; the effect is much smaller than for the
original data. Blue curve (or dark grey): the simulation
according to the statistical model discussed in section VI
shows a similar behaviour as the empirical data, but in the
simulation $\Delta g_j$ is a little larger than the empirical
one except for the largest $|G_j|$, where the simulated
$\Delta g_j$ is slightly smaller than the empirical one.

This finding can be explained by the presence of
autocorrelations in the time series of $\delta g_i$, which can
be illustrated when we shuffle the data for each stock by
exchanging each tick return with another randomly chosen tick
return. The light grey curve in Fig. 3 shows that for shuffled
data $\Delta g_j$ increases only marginally for large
aggregate returns, suggesting that autocorrelations of the
tick returns have a strong influence on the mean tick return
size $\Delta g_j$. Indeed, we find that the absolute values
$|\delta g_i|$ of the tick return are long-range correlated in
tick time with a correlation function decaying like
$\Delta i^{-0.16}$ for large time lags
$\Delta i = |i_1 - i_2|$, as shown in Fig. 4. If these
correlations are destroyed by shuffling, in each interval of
100 trades only a few large tick returns remain so that the
average over these 100 tick returns approximates the global
mean of all tick returns in the data set.
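The effect of shuffling on the interval averages can be illustrated with synthetic data. The toy series below is an assumption and not the Island data: step sizes share a common scale within each block of 100 trades, a crude stand-in for the clustering produced by long-range correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
n_blocks, n_trades = 2000, 100

# Toy clustered step sizes: one log-normal scale per block times |Gaussian| noise.
block_scale = np.exp(0.5 * rng.normal(size=n_blocks))
sizes = np.repeat(block_scale, n_trades) * np.abs(rng.normal(size=n_blocks * n_trades))

def interval_means(x, n):
    """Mean step size in consecutive intervals of n trades."""
    return x.reshape(-1, n).mean(axis=1)

original = interval_means(sizes, n_trades)
shuffled = interval_means(rng.permutation(sizes), n_trades)   # destroys clustering
spread_ratio = shuffled.std() / original.std()
```

After shuffling, the interval means concentrate around the global mean, so `spread_ratio` is well below one; this mirrors the flat light grey curve described for the shuffled data.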

In contrast, in the empirical unshuffled data correlations
lead to intervals where many tick returns are large, so that
the average tick return size is also large. The average tick
return size $\Delta g_j$ can well characterize the interval
only because these autocorrelations exist. It turns out that
the increase of $\Delta g_j$ by a factor of two is the main
effect where the original empirical data deviates
significantly from shuffled data. Hence, we suggest that
fluctuations of the tick return size are responsible for the
non-Gaussian fluctuations of the aggregate return.

FIG. 4: Autocorrelation function of the absolute value of the
tick return $|\delta g_i|$ averaged over the data of ten
Nasdaq stocks in 2002. The function shows a power law decay in
tick time proportional to $\Delta i^{-0.16}$ for large
$\Delta i$.

Using Eq. (5), we can estimate whether the change by a
factor of two of the average tick return alone is enough to
explain large aggregate returns $G_j$ of more than five
standard deviations. To this end, we focus on the intervals
with the 50 largest aggregate returns ranging from
approximately 4 to almost 8 standard deviations. Here, we find
that $\Delta g_j$ fluctuates between 0.14 and 0.35. Assuming
uncorrelated returns, $\Delta N_j$ should be of the order
$\sqrt{N} \approx 10$ if each trade led to a price change, but
normal fluctuations could well lead to $\Delta N_j$ twice as
large as $\sqrt{N}$, so that large tick returns together with
fluctuations in the number difference could explain the large
aggregate returns we find in our data set.
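The estimate above amounts to a few multiplications in Eq. (5). As a rough check in normalized units, using only the numbers quoted in the text (the variable names are illustrative):

```python
import math

N = 100
dn_typical = math.sqrt(N)        # order of Delta N_j for uncorrelated signs
dn_large = 2.0 * dn_typical      # a fluctuation twice as large

# |G_j| ~ Delta g_j * Delta N_j for the two quoted bounds of Delta g_j.
estimates = {dg: (dg * dn_typical, dg * dn_large) for dg in (0.14, 0.35)}
```

With $\Delta g_j = 0.35$ and $\Delta N_j \approx 20$ the product is about 7 standard deviations, which is of the same order as the largest aggregate returns mentioned above.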

Thus, we find that in intervals with 100 trades large
$|G_j|$ do not mainly depend on single extremely large tick
returns. It rather turns out that correlations between the
tick returns lead to large average tick returns $\Delta g_j$
in an interval, and the fluctuations of $\Delta g_j$ can
account for the non-Gaussian distribution of the aggregate
returns.



IV. NUMBER DIFFERENCE 

The diffusion process of aggregate returns is not only
influenced by the step width (i.e. the tick return size), but
also by the direction of the steps. Therefore, we now analyze
the influence of the number difference $\Delta N_j$ in
Eq. (5). In order to treat positive and negative aggregate
returns in the same analysis, it is useful to replace
$\Delta N_j$ by the sign-adapted number difference

$$\Delta n_j = \mathrm{sign}(G_j)\,\Delta N_j. \qquad (8)$$

A positive value of $\Delta n_j$ indicates that the price
tends to move in one specific direction leading to an
aggregate return with the same sign. $\Delta n_j$ can be
negative if there are a few large tick returns determining the
direction of the aggregate return, but also many small tick
returns with the opposite direction which do not affect the
aggregate return very much. Fig. 5 shows a density plot of the
aggregate return $|G_j|$ against the sign-adapted number
difference $\Delta n_j$. A linear regression yields an $R^2$
of 0.32, a large correlation coefficient confirming the visual
impression that $\Delta n_j$ and $|G_j|$ are strongly
connected. We can also see that $\Delta n_j$ is mostly
positive for large returns $G_j$, so that each large price
change is accompanied by a certain sign-adapted number
difference $\Delta n_j$.

FIG. 5: Density plot of the aggregate return $|G_j|$ against
the difference $\Delta n_j$ between the number of tick returns
with the same and with the opposite direction as the aggregate
return, for ten Nasdaq stocks. The points are coded from light
grey to black indicating the number of events from 1 to more
than 600. A linear regression has a large correlation
coefficient $R^2 = 0.32$.

We now plot in Fig. 6 $\Delta n_j$ against the rank
according to $|G_j|$. We find that except for the largest
(approximately 15%) of the aggregate returns, $\Delta n_j$
grows linearly with the rank while in Fig. 3 $\Delta g_j$
remained almost constant in that region. For the largest
ranks, $\Delta n_j$ increases more rapidly, so that all in all
the smoothed curve (averaged over 100 intervals) grows from 0
to 18 between very small and extremely large aggregate
returns. Thus, in intervals with very large returns there are
approximately 18 trades pushing the price in one direction
(assuming that all other trades cancel each other out), so
that even with rather small tick returns this can lead to
large returns in aggregation. Focusing on the 50 largest
$G_j$, we find that $\Delta n_j$ ranges from 4 to 41, clearly
above the expected standard deviation of 10 when assuming
uncorrelated returns and $n_j = N$.

Thus, the fluctuations of $\Delta n_j$ around the mean value
are crucial for getting large aggregate returns. The number
difference seems to be the main mechanism affecting the
aggregate return since it changes much more drastically than
the tick return size when the aggregate return increases. On
the other hand, when we compare the results to the analysis
with shuffled data (light grey curve in Fig. 6), it turns out
that this effect is very similar to what happens with random
price changes. Hence, the basic movement of the aggregate
return seems to depend mostly on the number difference, but
the non-Gaussian large aggregate price changes only occur if
the tick returns are large.

FIG. 6: (Color online) Black curve: the sign-adapted number
difference $\Delta n_j$ is plotted against the rank according
to the aggregate return $|G_j|$ for 10 Nasdaq stocks, smoothed
by averaging over 100 intervals. $\Delta n_j$ grows from zero
to 18. The relation between $\Delta n_j$ and the rank seems to
be linear except for the largest 15% of the aggregate returns.
A simulation (blue curve (or dark grey)) using a normal
distribution for $\Delta N_j$ leads to nearly the same
dependence on the rank. For shuffled data (light grey curve),
the curve is slightly flatter, but the difference is not
large.



V. MARKET ORDER SIGNS AND DIRECTION 
OF TICK RETURNS 

It is known that the signs of market orders are strongly
correlated, which means that there is a large probability that
a buy market order will be followed by another buy market
order. Thus, it is probable that large number differences in
the direction of tick returns are caused by large numbers of
equally signed market orders. In order to analyze the relation
between the number difference and the market order flow, we
define the difference $\Delta n_j^m$ between the number
$n_j^{m+}$ of market orders with the same direction as $G_j$
and the number $n_j^{m-}$ of market orders with the opposite
direction:

$$\Delta n_j^m = n_j^{m+} - n_j^{m-}. \qquad (9)$$

In Fig. 7, we plot the sign-adapted number difference
$\Delta n_j$ against the market order difference
$\Delta n_j^m$. We find a strong correlation between
$\Delta n_j$ and $\Delta n_j^m$; a linear regression yields a
correlation coefficient $R^2$ of 0.29. However, there are also
large fluctuations suggesting that the number difference is
also due to order book dynamics, namely limit order placement
and cancellation as well as asymmetries in the order book. A
model for price formation due to these quantities was recently
proposed by Mike and Farmer.

FIG. 7: Comparison between sign-adapted number difference
$\Delta n_j$ and market order difference $\Delta n_j^m$ for
ten Nasdaq stocks. The points are coded from light grey to
black indicating the number of events from 1 to more than 200.
A linear regression yields a correlation coefficient
$R^2 = 0.29$, thus there is a strong connection between the
two quantities. On the other hand, the events scatter widely
so that small $\Delta n_j$ are often linked with large
$\Delta n_j^m$ and vice versa.



VI. DISTRIBUTION OF AGGREGATE
RETURNS AND A STATISTICAL MODEL

In the first part of this paper, we analyzed the mechanism
leading to large aggregate returns and showed that the varying
step width $\Delta g_j$ accounts for the non-Gaussian behavior
of the diffusion process of price movements. Now we want to
use our results in a statistical model and reproduce the
cumulative distribution function of the absolute value of the
aggregate return $|G_j|$.

The model given by Eq. (5) belongs to the well-known class
of stochastic volatility models (see e.g. [42]) consisting of
a noise term multiplied by a time-dependent volatility giving
the magnitude of the fluctuations. In the present paper, the
model is based on a microscopic description of the price
process, so that we can fit the microscopic quantities
determining the aggregate return in order to estimate the
parameters of the model. In this approach the model is
parameter-free in the sense that there are no parameters
fitting the aggregate returns directly, though we fit the
distributions of its determinants like the step width
$\Delta g_j$ and the number difference $\Delta N_j$. We also
discuss corrections to the model by including the tick return
asymmetries according to Eq. (7).

We first analyze the distributions of $\Delta g_j$ and $\Delta N_j$.



FIG. 8: (Color online) Estimation of the parameters for the
simulation (results shown as dotted lines) from empirical data
for ten Nasdaq stocks. (a) The tail of the cumulative
distribution of $\Delta g_j$ (line) can be well fitted with
$P(x > \Delta g_j) = e^{-a (x - x_0)/\overline{\Delta g}}$,
where $\overline{\Delta g} \approx 0.12$ is the average of all
$\Delta g_j$ and the parameters are $a = 3.6$ and
$x_0 = 0.094$. For $\Delta g_j < x_0$ the limited tick size
leads to a plateau. (b) The probability distribution of
$\Delta N_j$ (line) follows in good approximation a normal
distribution with mean 0.24 and standard deviation 9.0. (c) As
a rough approximation, the average of the cumulative
distribution of the positive (line) and negative (dashed line)
values of $\Delta g_j^+ - \Delta g_j^-$ is parameterized
proportional to two exponential functions
$e^{-a_{1,2} x/\overline{\Delta g}}$ for
$|\Delta g_j^+ - \Delta g_j^-| \lesssim 0.1$, with $a_1 = 8.0$
and $a_2 = 4.8$ (dash-dotted line). The simulation (dotted
line) uses the adapted $a_1 = 9.0$ and $a_2 = 2.0$ in order to
compensate the change in the distribution after taking into
account
$\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$.

Fig. 8(a) shows the cumulative distribution of $\Delta g_j$
in a log-linear plot. The approximately straight line suggests
that the tail follows an exponential distribution which can be
well fitted with
$P(x > \Delta g_j) = e^{-a (x - x_0)/\overline{\Delta g}}$,
where $\overline{\Delta g} \approx 0.12$ is the average of all
$\Delta g_j$ and the parameters are $a = 3.6$ and
$x_0 = 0.094$. In the region of the smallest values of
$\Delta g_j < x_0$, the limited tick sizes of the different
stocks lead to a plateau. In section IV we already found
evidence that $\Delta N_j$ behaves similarly to uncorrelated
data since in Fig. 6 the shuffled data shows almost the same
dependence on the rank of the corresponding $|G_j|$. Fig. 8(b)
shows that indeed $\Delta N_j$ can be well described by a
Gaussian noise with mean 0.24 and standard deviation 9.0.

FIG. 9: (Color online) Cumulative distribution of the
empirical aggregate return (circles) obtained from ten Nasdaq
stocks in comparison with different simulations. A simulation
of Eq. (5) (triangles) leads to a reasonable approximation of
the empirical data, but it overestimates the probability of
large returns. It becomes a little broader if we add the tick
return asymmetry $\Delta g_j^+ - \Delta g_j^-$ according to
Eq. (7) and simulate independent quantities (diamonds). The
simulation (squares) matches the empirical data very well if
we incorporate correlations by generating
$\Delta g_j^+ - \Delta g_j^-$ according to the conditional
expectation value
$\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$.

In order to analyze the accuracy of the approximation given
in Eq. (5), we simulate two independent time series according
to the fitted functions for $\Delta g_j$ and $\Delta N_j$ and
build the aggregate return $G_j$ as the product of
$\Delta g_j$ and $\Delta N_j$. In Fig. 9 we can compare the
empirically found cumulative distribution of aggregate returns
$|G_j|$ (circles) to the results of this simulation
(triangles). The simulation of Eq. (5) leads to a reasonable
agreement with the actual aggregate return, but it
overestimates the probability of large aggregate returns. We
note that the parameters of the simulation are completely
determined by the empirically found distributions of
$\Delta g_j$ and $\Delta N_j$, so that in this sense the
simulation of $|G_j|$ has no free parameters.
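A minimal sketch of such a simulation of Eq. (5) is given below, using the fitted parameters quoted in the text. Two simplifications are assumptions of this sketch and not part of the original procedure: the plateau of the $\Delta g_j$ distribution below $x_0$ is replaced by a hard lower cutoff at $x_0$, and the exponential tail is used for the whole range above it.

```python
import numpy as np

rng = np.random.default_rng(42)
n_intervals = 100_000

A, X0, DG_MEAN = 3.6, 0.094, 0.12     # exponential tail fit of Delta g_j
DN_MEAN, DN_STD = 0.24, 9.0           # Gaussian fit of Delta N_j

# P(x > Delta g) = exp(-A (x - X0) / DG_MEAN) for x >= X0 (hard cutoff assumed).
delta_g = X0 + rng.exponential(scale=DG_MEAN / A, size=n_intervals)
delta_n = rng.normal(DN_MEAN, DN_STD, size=n_intervals)
g = delta_g * delta_n                  # Eq. (5), two independent series

def survival(abs_values, thresholds):
    """Empirical cumulative distribution P(|G| > x)."""
    abs_values = np.sort(abs_values)
    counts = np.searchsorted(abs_values, thresholds, side="right")
    return 1.0 - counts / abs_values.size

x = np.array([1.0, 2.0, 4.0])
p = survival(np.abs(g), x)
```

The array `p` can be compared with an empirical cumulative distribution in the same normalized units; the sketch does not reproduce the asymmetry corrections discussed next.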

In the following, we want to address the remaining
deviations of the simulation from the empirical data. Eq. (7)
gives an exact formula for $G_j$ and provides a good
parameterization for the error term, which reads

$$G_j - \Delta g_j \Delta N_j = \frac{2 n_j^+ n_j^-}{n_j} \left( \Delta g_j^+ - \Delta g_j^- \right). \qquad (10)$$



We find that the term $2 n_j^+ n_j^- / n_j$ has no systematic
influence on the aggregate return since it shows almost no
dependence on the rank according to the aggregate return. In
the following, we thus approximate it by its average value
$\langle 2 n_j^+ n_j^- / n_j \rangle = 28.7$, so that the
error term is determined by the asymmetries
$\Delta g_j^+ - \Delta g_j^-$ in the mean tick return size.

The cumulative distribution of $\Delta g_j^+ - \Delta g_j^-$
is shown in Fig. 8(c). The main part of the distribution could
be well fitted by an exponential function, but in the tail the
distribution becomes broader. Thus, we add the term with
$\Delta g_j^+ - \Delta g_j^-$ to our simulation by creating a
third independent time series according to the empirical
distribution of $\Delta g_j^+ - \Delta g_j^-$. Fig. 9
(diamonds) shows that this leads to an even broader
distribution of the aggregate return. Since the difference to
the distribution according to Eq. (5) is small, the tick
return asymmetry seems to have only a small influence on the
aggregate return.

A more accurate agreement with the empirical data can be
obtained by taking into account correlations between the
quantities involved in the process. The correlation
coefficients between them are shown in the following table,
where the correlations between the absolute values are shown
in brackets:

                          $\Delta N_j$     $\Delta g_j^+ - \Delta g_j^-$   $\Delta g_j \Delta N_j$
$\Delta g_j$              -0.02 (-0.07)    -0.01 (0.37)                    -0.01 (0.34)
$\Delta N_j$              1                -0.35 (0.01)                    0.95 (0.87)
$\Delta g_j \Delta N_j$   0.95 (0.87)      -0.41 (0.02)                    1
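Entries of such a table, with the value for the absolute quantities in brackets, could be computed as sketched below. The use of the Pearson coefficient via `numpy.corrcoef` is an assumption, since the text does not name the estimator, and `correlation_pair` is a hypothetical helper:

```python
import numpy as np

def correlation_pair(x, y):
    """Pearson correlation of (x, y) and of (|x|, |y|)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    plain = np.corrcoef(x, y)[0, 1]
    absolute = np.corrcoef(np.abs(x), np.abs(y))[0, 1]
    return plain, absolute

# Toy data: perfectly anti-correlated values whose magnitudes are correlated.
plain, absolute = correlation_pair([-2.0, -1.0, 1.0, 2.0], [2.0, 1.0, -1.0, -2.0])
```

The toy example shows why both numbers are reported: the signed values and their magnitudes can carry very different correlations.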



$\Delta g_j$ and $|\Delta N_j|$ show slightly negative
correlations, which might suggest that people act more
cautiously when large tick returns indicate a low liquidity.
In these times, traders try not to place too many consecutive
orders with the same sign because they know that it could lead
to a large price change and increased trading costs.
Furthermore, the strong anti-correlations between $\Delta N_j$
and $\Delta g_j^+ - \Delta g_j^-$ also indicate cautious
traders: if there are large asymmetries, so that e.g. the
positive tick returns are much larger than the negative ones,
people tend to use the higher liquidity in negative direction
so that in these times they sell more often than they buy. The
relation between liquidity imbalance and market efficiency has
been analyzed elsewhere in the literature. The large
correlations between $\Delta g_j$ and
$|\Delta g_j^+ - \Delta g_j^-|$ show that we can expect large
variations of the tick return in positive and negative
direction when the tick return is in general large.

We now want to incorporate correlations in our simulation.
The strongest non-trivial correlations appear between
$\Delta g_j \Delta N_j$ and $\Delta g_j^+ - \Delta g_j^-$, which also include some of
the correlations between $\Delta g_j^+ - \Delta g_j^-$ and $\Delta g_j$ as well as
$\Delta N_j$. However, it turns out that the conditional expectation
value $\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$ is non-linear, as seen
in Fig. 10 (circles), where it is plotted against $\Delta g_j \Delta N_j$.

FIG. 10: (Color online) Conditional expectation value
$\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$ plotted against $\Delta g_j \Delta N_j$ (circles),
obtained from the data of 10 Nasdaq stocks. A fit
leads to $\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j} \approx -0.0057 \cdot \mathrm{sgn}(\Delta g_j \Delta N_j) \cdot |\Delta g_j \Delta N_j|^{1.59}$
(dashed line). The tick return asymmetry
$\Delta g_j^+ - \Delta g_j^-$ is strongly correlated with the mean tick return
size $\Delta g_j$ and strongly anti-correlated with the number
difference $\Delta N_j$. Using the conditional expectation value in the
simulation incorporates these correlations, which allows for the
reproduction of the distribution of aggregate returns.



The function can be well fitted by $-\mathrm{sgn}(x)\, a\, |x|^{\beta}$ with
$a = 0.0057$ and $\beta = 1.59$ (dashed line).
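
A minimal sketch of this fit function is given below, together with one possible way to recover $a$ and $\beta$ from binned conditional means by a straight-line fit in log-log space. The fitting procedure actually used is not described in this section, so `fit_power_law` is an assumption of the sketch, as are both function names; the input `x` stands for $\Delta g_j \Delta N_j$ in whatever units the data use.

```python
import numpy as np

def cond_asymmetry(x, a=0.0057, beta=1.59):
    """Fitted form -sgn(x) * a * |x|**beta with the parameters quoted in the text."""
    x = np.asarray(x, dtype=float)
    return -np.sign(x) * a * np.abs(x) ** beta

def fit_power_law(x, y):
    """Estimate (a, beta) in y = -sgn(x) * a * |x|**beta.

    Assumption of this sketch: a least-squares line through
    (log|x|, log|y|); requires x != 0 and y != 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(np.log(np.abs(x)), np.log(np.abs(y)), 1)
    return np.exp(intercept), slope

# Round trip on noise-free toy points generated from the fit function itself.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
a_hat, beta_hat = fit_power_law(x, cond_asymmetry(x))
```

On real binned data the log-log fit weights small and large $|x|$ differently from a direct non-linear fit, so the recovered parameters may differ slightly from those quoted above.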

In order to incorporate this conditional expectation
value into the simulation, we first create three independent
time series for $\Delta g_j$, $\Delta N_j$, and $\Delta g_j^+ - \Delta g_j^-$.
Then, for each $j$ we add the conditional expectation value
$\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$ to $\Delta g_j^+ - \Delta g_j^-$, according to the
value of $\Delta g_j \Delta N_j$ for that $j$. This method leads to a
distribution of $\Delta g_j^+ - \Delta g_j^-$ that differs from the initial one,
so that we can no longer generate $\Delta g_j^+ - \Delta g_j^-$ from
the unconditional empirical distribution. As a rough
approximation, we parameterize this distribution by two
exponential functions $e^{\mp a_{1,2}\, x / \Delta g}$ for $\Delta g_j^+ - \Delta g_j^- \gtrless 0$.
Then, we adapt the factors in the exponent in such a
way that the resulting unconditional distribution fits the
empirical one. A fit to the empirical distribution yields
$a_1 = 8.0$ and $a_2 = 4.8$; for the simulation we use the
adapted values $a_1 = 9.0$ and $a_2 = 2.0$ (compare Fig. 0c)). The
resulting distribution of $G_j$ does not depend very much
on the exact values of $a_{1,2}$.
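
The procedure above can be sketched as follows. This is only an illustration under stated assumptions, not the simulation used for the figures: the lognormal distribution for $\Delta g_j$ and the width of the Gaussian for $\Delta N_j$ are placeholders for the empirical distributions, the two exponential branches are assumed to join continuously at zero, and the `scale` parameter and all function names are hypothetical.

```python
import numpy as np

def cond_expectation(x, a=0.0057, beta=1.59):
    """Fitted conditional expectation -sgn(x) * a * |x|**beta from the text."""
    x = np.asarray(x, dtype=float)
    return -np.sign(x) * a * np.abs(x) ** beta

def two_sided_exponential(rng, size, a1=9.0, a2=2.0, scale=1.0):
    """Draw from a density proportional to exp(-a1*x/scale) for x > 0 and
    exp(a2*x/scale) for x < 0.

    Assumptions of this sketch: both branches share the same prefactor
    (density continuous at x = 0), and `scale` sets the unit of x.
    """
    p_positive = a2 / (a1 + a2)              # probability mass of the x > 0 branch
    positive = rng.random(size) < p_positive
    magnitude = rng.exponential(size=size)   # unit-mean exponential draws
    return np.where(positive, magnitude * scale / a1, -magnitude * scale / a2)

def add_conditional_shift(dg, dn, asym):
    """Shift each asymmetry value by the conditional expectation at dg * dn."""
    product = np.asarray(dg, dtype=float) * np.asarray(dn, dtype=float)
    return np.asarray(asym, dtype=float) + cond_expectation(product)

rng = np.random.default_rng(0)
n = 10_000
dg = rng.lognormal(mean=-3.0, sigma=0.5, size=n)  # placeholder for the empirical distribution
dn = rng.normal(loc=0.0, scale=10.0, size=n)      # Gaussian; the width is a placeholder
asym = add_conditional_shift(dg, dn, two_sided_exponential(rng, n))
```

Because the shift has the opposite sign of $\Delta g_j \Delta N_j$, the resulting asymmetry series is anti-correlated with the product even though the three input series are independent.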

The effect of the correlations represented by the conditional
expectation value $\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$ is very
large and leads to a cumulative distribution of $|G_j|$
(squares in Fig. 5) very similar to the empirical one (circles).
It is worth noting that now the largest events are
no longer necessarily the ones with the largest values
of $\Delta g_j \Delta N_j$. Due to the anti-correlations expressed in
$\langle \Delta g_j^+ - \Delta g_j^- \rangle_{\Delta g_j \Delta N_j}$, very large values of $\Delta g_j \Delta N_j$ can
lead to relatively large values of $\Delta g_j^+ - \Delta g_j^-$ of the
opposite sign, reducing the aggregate return.

In addition to the distribution of the aggregate return,
the simulation also agrees with other properties of
the empirical data that we found earlier in this paper. In
Fig. 3 and Fig. [|j] we also plotted the data from the simulation
against the rank according to the aggregate return $|G_j|$.
For $\Delta N_j$ the simulation matches the empirical data very
well, while in Fig. 3 we see that the simulated $\Delta g_j$
shows the same dependence on the rank as the empirical
data but is generally a little larger than the real one,
except for the largest aggregate returns, which might be
due to the cutoff around 0.094 that we used in the simulation
of the distribution of $\Delta g_j$. We also find that the role of
$\Delta g_j^+ - \Delta g_j^-$ in determining large aggregate returns is
slightly overestimated by our simulation. Nevertheless, the simulation
covers the main features of the empirical data, although
we neglected many of the subtle relations between the
different quantities.
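
One possible way to organize such a rank comparison is sketched below: intervals are sorted by decreasing $|G_j|$, and a quantity such as $|\Delta N_j|$ or $\Delta g_j$ is averaged in equal-count rank bins. The binning scheme and the function name `mean_by_rank` are assumptions of this sketch; the same function would be applied to the empirical and the simulated series.

```python
import numpy as np

def mean_by_rank(abs_g, quantity, n_bins):
    """Average `quantity` in rank bins ordered by decreasing |G_j|.

    Bin 0 holds the intervals with the largest aggregate returns.
    """
    abs_g = np.asarray(abs_g, dtype=float)
    quantity = np.asarray(quantity, dtype=float)
    order = np.argsort(-abs_g)                       # rank 1 = largest |G_j|
    chunks = np.array_split(quantity[order], n_bins)  # nearly equal-count bins
    return np.array([chunk.mean() for chunk in chunks])

# Toy input: the quantity grows with |G_j|, so the first bin has the larger mean.
profile = mean_by_rank([1.0, 4.0, 2.0, 3.0], [10.0, 40.0, 20.0, 30.0], n_bins=2)
```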



VII. DISCUSSION AND CONCLUSION 

Our results can be divided into two parts: First, we 
showed that the movement of stock prices in intervals 
with a constant number of trades can be understood as a 
diffusion process with a varying step width. Here, Gaus- 
sian fluctuations of the number difference determine the 
basic price movement, but the non-Gaussian large price 
changes are due to changes in the tick return size coin- 
ciding with large number differences at the same time. 
The large influence of the tick return size is caused by its 
autocorrelations assuring that in a 100 tick interval one 
can find many large tick returns so that the mean value of 
the tick return can be large. In such intervals, the price 
change in response to a trade is large, which is referred 
to as a period of low liquidity. Thus, the diffusion process
of stock returns depends largely on fluctuations in
the liquidity, in agreement with the findings of previous
works mil m hi.

In the second part of this paper, we found that the dis- 
tribution of aggregate returns can be reasonably approxi- 
mated by simulating the microscopic quantities mean tick 
return size and number difference according to their em- 
pirically found distributions. A more accurate agreement 
can be obtained by taking into account asymmetries in 
the tick return size in positive and negative direction as 
well as correlations between the different quantities. 

To conclude, we found evidence that price fluctuations 
in intervals with a constant number of trades can be de- 
scribed by a diffusion process with a varying step width. 
The long-term autocorrelations in the tick return size 
make sure that periods of low liquidity, where the price 
change due to a trade is large, last long enough to cause 
large aggregate returns in intervals with many trades. 
Our results suggest that the power law distribution of
aggregate returns might not be universal but rather depends
on a more complicated mechanism which is a combination
of the dynamics of the trading frequency, the
dynamics of the step width and the Gaussian process of
the step direction.